Back read-only MAP_SHARED file mappings with MAP_PRIVATE #84
Merged
jserv reviewed Jun 6, 2026
jserv requested changes Jul 2, 2026
jserv (Contributor) left a comment:
Rebase onto the latest main branch, resolve any merge conflicts, and refine the changes based on the review feedback.
8dcfec9 to
c2aa4fa
A MAP_SHARED, PROT_READ mapping of a file opened O_RDONLY is common -- the JVM maps its ~135 MiB lib/modules image exactly this way. That case is already fixed on main by 520568c ("Harden runtime around foot"), which routes non-writable-fd MAP_SHARED requests through the pread-snapshot fallback in sys_mmap's non-fixed path instead of installing a live overlay.

That fallback fires for any MAP_SHARED request the overlay's overlay_fd_writable() gate would reject, without checking the guest's requested prot. Two Linux-visible corners were left open as a result:

- mmap(MAP_SHARED, PROT_WRITE) of an O_RDONLY fd silently succeeded via the fallback instead of failing EACCES. Fixed by checking overlay_fd_writable() before falling through to pread, rolling back the allocation and returning EACCES when the guest asked for PROT_WRITE against a non-writable backing fd.
- Once a read-only MAP_SHARED mapping succeeded, nothing stopped a follow-up mprotect(PROT_READ | PROT_WRITE) from upgrading it. Linux tracks max_prot per VMA from the fd's open mode and rejects that upgrade with EACCES; sys_mprotect only consulted prot_to_perms() and happily granted it, so a subsequent guest write landed in guest-local memory with no error ever surfaced to the caller. Fixed by adding guest_region_t.backing_ro, set on a MAP_SHARED region whenever its backing_fd lacks write access (the same overlay_fd_writable() check), threaded through regions_mergeable (so two regions with different backing_ro never silently coalesce), region_snapshot_t capture/restore, and all three sys_mremap region-recreation sites. sys_mprotect now rejects a PROT_WRITE request over any MAP_SHARED region with backing_ro set, before doing any PTE work.

test-mmap-shared-ro covers the O_RDONLY read path (already fixed by 520568c), a second concurrent read-only mapping, the O_RDONLY mmap(PROT_WRITE) rejection, the mprotect(PROT_WRITE) upgrade rejection, and the read-only-mapping-on-O_RDWR-fd branch.
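
The mprotect gate and the merge rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: region_sketch_t, mprotect_gate and backing_ro_mergeable are hypothetical names, and the real guest_region_t and regions_mergeable carry and check far more state. Only the backing_ro flag, the MAP_SHARED condition and the EACCES result come from the description.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical, heavily simplified stand-in for guest_region_t. */
typedef struct {
    bool shared;     /* region was created with MAP_SHARED */
    bool backing_ro; /* backing fd lacks write access */
} region_sketch_t;

/*
 * Decide whether a protection change may proceed.
 * Returns 0 when allowed, or -EACCES when the guest asks for PROT_WRITE
 * on a MAP_SHARED region whose backing fd is not writable.
 */
static int mprotect_gate(const region_sketch_t *r, int prot)
{
    if ((prot & PROT_WRITE) && r->shared && r->backing_ro)
        return -EACCES;
    return 0;
}

/*
 * Only the backing_ro part of a merge test: two regions that disagree
 * on backing_ro must stay separate, otherwise the flag would be lost
 * for one of them after coalescing.
 */
static bool backing_ro_mergeable(const region_sketch_t *a,
                                 const region_sketch_t *b)
{
    return a->backing_ro == b->backing_ro;
}
```

Running the check before any page-table work, as the description says sys_mprotect now does, means a rejected request leaves the mapping untouched.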
NPAGES is bumped from 64 (256 KiB, fits in one 2 MiB HVF segment) to 768 (3 MiB, crosses a segment boundary) so the cases actually exercise hvf_segment_split's multi-block path.
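
The arithmetic behind the NPAGES change can be checked with a small sketch. It assumes a 4 KiB page (implied by 64 pages being 256 KiB) and the 2 MiB segment size quoted above; the constant and function names are hypothetical.

```c
#include <stddef.h>

/* Assumed sizes: 4 KiB pages, 2 MiB HVF segments. */
#define SKETCH_PAGE_SIZE    ((size_t)4096)
#define SKETCH_SEGMENT_SIZE ((size_t)2 * 1024 * 1024)

/* The two sizes quoted in the description. */
_Static_assert(64 * 4096 == 256 * 1024, "64 pages are 256 KiB");
_Static_assert(768 * 4096 == 3 * 1024 * 1024, "768 pages are 3 MiB");

/*
 * Minimum number of segments a mapping of npages pages can occupy,
 * reached when the mapping starts on a segment boundary.
 */
static size_t min_segments_spanned(size_t npages)
{
    size_t bytes = npages * SKETCH_PAGE_SIZE;
    return (bytes + SKETCH_SEGMENT_SIZE - 1) / SKETCH_SEGMENT_SIZE;
}
```

A 3 MiB mapping is larger than one segment, so it crosses a boundary wherever it is placed, while a 256 KiB mapping can sit entirely inside one segment.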
c2aa4fa to
bdde276
Contributor
Thank you, @Max042004, for contributing!
A MAP_SHARED, PROT_READ mapping of a file opened O_RDONLY could never be
installed. hvf_apply_file_overlay_quiesced() always mmap'd the host page
PROT_READ|PROT_WRITE and mapped the HVF segment RWX. On a read-only fd the
host mmap fails with EACCES (writable mapping of an O_RDONLY fd); forcing
PROT_READ then trips hv_vm_map(), because a MAP_SHARED mapping of an
O_RDONLY fd has macOS max_protection=READ and HVF cannot grant stage-2
rights (RWX) beyond the host region's max_protection (HV_ERROR).
This blocked every workload that maps a read-only file MAP_SHARED -- most
visibly the JVM, which maps its ~135 MiB lib/modules image exactly this
way and crashed on startup.
Choose the host backing from what the fd and the guest actually need:
the file; an O_RDONLY fd still yields EACCES, matching Linux).
is RWX, so the segment maps and cross-mapping coherence is preserved).
max_protection is RWX so the segment maps; the pages still show file
content, and the guest's stage-1 tables keep the region read-only so
the private copy is never dirtied -- no observable MAP_SHARED
divergence for a read-only mapping.
The guest-requested prot is threaded through hvf_apply_file_overlay(),
hvf_apply_file_overlay_quiesced(), and restore_file_overlay_range() so
every overlay install/restore site picks the correct backing.
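
One plausible reading of the backing selection above, written as a pure decision function. This is a sketch under assumptions, not the PR's code: backing_choice_t and choose_host_backing are hypothetical names, and the three cases are inferred from the PR title, the EACCES remark, and the MAP_SHARED versus private-copy wording in the list. The host prot bits and the actual hv_vm_map() call are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical result: which host mmap flag to use, or an error. */
typedef struct {
    int host_map_flag; /* MAP_SHARED or MAP_PRIVATE; 0 when err is set */
    int err;           /* 0 on success, otherwise a negative errno */
} backing_choice_t;

static backing_choice_t choose_host_backing(bool fd_writable, int guest_prot)
{
    if (guest_prot & PROT_WRITE) {
        /* Writable request: an O_RDONLY fd yields EACCES, as on Linux. */
        if (!fd_writable)
            return (backing_choice_t){ .host_map_flag = 0, .err = -EACCES };
        /* Writable fd: a shared host mapping so writes reach the file. */
        return (backing_choice_t){ .host_map_flag = MAP_SHARED, .err = 0 };
    }

    /* Read-only request on a writable fd: stay shared, keeping
     * coherence with other mappings of the same file. */
    if (fd_writable)
        return (backing_choice_t){ .host_map_flag = MAP_SHARED, .err = 0 };

    /* Read-only request on an O_RDONLY fd: a private host mapping,
     * which the guest never dirties because its own tables keep the
     * region read-only. */
    return (backing_choice_t){ .host_map_flag = MAP_PRIVATE, .err = 0 };
}
```

Keeping the choice in one function fits the note that every install and restore site must pick the same backing: each site would pass the guest-requested prot and get a consistent answer.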
Add test-mmap-shared-ro covering the O_RDONLY read path, a second
concurrent read-only mapping, EACCES on a writable request, and the
read-only-mapping-on-O_RDWR-fd branch.
(cherry picked from commit 337d39a4313109884112a86a0c4147bddfe18fa1)
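
For reference, the Linux behavior that these cases target can be probed with a short standalone function. This is not the test-mmap-shared-ro source; ro_probe_t and probe_shared_ro are hypothetical names, and the sketch assumes a writable /tmp. It records the errno from a read-only MAP_SHARED mapping of an O_RDONLY fd, from a writable MAP_SHARED request on the same fd, and from an mprotect upgrade of the read-only mapping.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Each field holds 0 on success or the errno that was observed. */
typedef struct {
    int map_ro_err;  /* mmap(MAP_SHARED, PROT_READ) */
    int map_rw_err;  /* mmap(MAP_SHARED, PROT_READ | PROT_WRITE) */
    int upgrade_err; /* mprotect(PROT_READ | PROT_WRITE); -1 if not tried */
} ro_probe_t;

/* Returns 0 if the probe ran, -1 if the scratch file could not be set up. */
static int probe_shared_ro(ro_probe_t *out)
{
    char path[] = "/tmp/mmap-shared-ro-XXXXXX";
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    /* Create a one-page scratch file, then reopen it read-only. */
    int wfd = mkstemp(path);
    if (wfd < 0)
        return -1;
    if (ftruncate(wfd, (off_t)len) != 0) {
        close(wfd);
        unlink(path);
        return -1;
    }
    close(wfd);
    int fd = open(path, O_RDONLY);
    unlink(path);
    if (fd < 0)
        return -1;

    out->map_ro_err = out->map_rw_err = 0;
    out->upgrade_err = -1;

    /* A writable shared mapping of a read-only fd should be refused. */
    void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (rw == MAP_FAILED)
        out->map_rw_err = errno;
    else
        munmap(rw, len);

    /* A read-only shared mapping should succeed but not be upgradable. */
    void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (ro == MAP_FAILED) {
        out->map_ro_err = errno;
    } else {
        out->upgrade_err =
            mprotect(ro, len, PROT_READ | PROT_WRITE) == 0 ? 0 : errno;
        munmap(ro, len);
    }

    close(fd);
    return 0;
}
```

On Linux both the writable mmap and the mprotect upgrade are expected to fail with EACCES, because the limit comes from the fd's open mode and not from file permissions.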
Summary by cubic
Fixes read-only MAP_SHARED mappings of O_RDONLY files by using MAP_PRIVATE when needed, returns EACCES for MAP_SHARED|PROT_WRITE on read-only fds, and blocks mprotect(PROT_WRITE) upgrades to preserve Linux max_prot semantics. This unblocks JVM lib/modules and matches Linux behavior.
Bug Fixes
Tests
test-mmap-shared-ro covering: read-only MAP_SHARED on O_RDONLY, concurrent read-only mappings, EACCES on writable request, read-only mapping on an O_RDWR fd, and EACCES on mprotect(PROT_WRITE).
Written for commit bdde276. Summary will update on new commits.